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^ 1 Introduction 

cn : 

^ ■ Spectacular advances in the understanding of the Big-Bang model of cos- 

i \ mology have been due to increasingly accurate observations of the properties 

of the Cosmic Microwave Background (CMB). The detector sensitivities of 
modern experiments have permitted to measure fluctuations of the CMB 
temperature with such a sensitivity that the contamination of the data by 
astrophysical foreground radiations, rather than by instrumental noise, is be- 
coming the major source of limitation. This will be the case, in particular, for 
OO | the upcoming observations by the Planck mission, to be launched by ESA in 

2008 [Lamarre et al. (2003), Mandolesi et al. (2000), Lamarre et al. (2000)], 
as well as for next generation instruments dedicated to the observation of 
CMB polarisation. 

^> | In this context, the development of data analysis methods dedicated to 

■ identifying and separating foreground contamination from CMB observations 
\ is of the utmost importance for future CMB experiments. In many astrophysi- 

O cal observations indeed, and in particular in the context of CMB experiments, 

' | signals and images contain contributions from several components or sources. 

Some of these sources may be of particular interest (CMB or other astrophys- 
\ ical emission), some may be unwanted (noise). 

■ Obviously, components cannot be properly studied in data sets in which 
| they appear only as a mixture. Component separation consists, for each of 

■ ■ »h . them, in isolating the emission from all the other components present in the 

rS ' data, in the best possible way. 

It should be noted that what "best" means depends on what the isolated 
data will be used for. Very often, one tries to obtain, for each component, 
an estimated map (or a set of maps at different frequencies) minimising the 
total error variance, i.e. minimising 

X 2 =^|%)- S (p)| 2 (1) 
p 

where s(p) is the true component emission, and s(p) its estimated value. 
p indexes the space of interest for the component, typically a set of pixels 
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(9 p , (j> p ), or modes (£, m) of a spherical harmonic decomposition of a full sky 
map, or a set of Fourier modes (k xi k y )... 

More generally, the objective of component separation is to estimate a 
set of parameters which describes the component of interest. In the simplest 
case, this set of parameters may be emission in pixels, but it may also be 
instead parameters describing statistical properties such as power spectra, 
spectral indices, etc. . . Since the set of parameters depends on the model 
assumed for the components, this model is of the utmost importance for 
efficient component separation. In the following, a significant part of the 
discussion will thus be dedicated to a summary of existing knowledge and of 
component modeling issues. 

In the following, it is assumed that we are given a set of observations yi(p), 
where i, ranging from 1 to A c hann, indexes the observation frequency. The 
observed emission in each of the frequency bands is supposed to result from 
a mixture of several astrophysical components, with additional instrumental 
noise. 

In this review paper, we discuss in some detail the problem of diffuse com- 
ponent separation. The paper is organised as follows: in the next section, we 
review the principles and implementation of the ILC, a very simple method to 
average the measurements obtained at different frequencies; section 3 reviews 
the known properties of diffuse sky emissions, useful to model the observa- 
tions and put priors on component parameters; section 4 discusses observation 
and data reduction strategies to minimize the impact of foregrounds on CMB 
measurements, based on physical assumptions about the various emissions; 
section 5 discusses the model of a linear mixture, and various options for 
its linear inversion to separate astrophysical components; section 6 discusses 
a non linear solution for inverting a linear mixture, based on a maximum 
entropy method; section 7 presents the general ideas behind Blind Source 
Separation (BSS) and Independent Component Analysis (ICA); section 8 
discusses a particular method, the Spectral Matching ICA (SMICA); section 
10 concludes with a summary, hints about recent and future developments, 
open issues, and a prospective. 

Let the reader be warned beforehand that this review may not do full 
justice to much of the work having been done on this very exciting topic. 
The discussion may be somewhat partial, although not intentionally. It has 
not been possible to the authors to review completely and compare all of the 
relevant work, partly for lack of time, and partly for lack of details in the 
published papers. As much as possible, we have nonetheless tried to mention 
all of the existing work, to comment the ideas behind the methods, and to 
quote most of the interesting and classical papers. 
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2 ILC: Internal Linear Combination 

The Internal Linear Combination (ILC) component separation method as- 
sumes very little about the components. One of them (e.g. the CMB) is 
considered to be the only emission of interest, all the other being unwanted 
foregrounds. 

It is assumed that the template of emission of the component of interest 
is the same at all frequencies of observation, and that the observations are 
calibrated with respect to this component, so that for each frequency channel 
i we have: 

yi(p) = s(p) + fi(p)+rn(p) (2) 

where fi (p) and n, (p) are foregrounds and noise contributions respectively in 
channel i. 

A very natural idea, since all the observations actually measure s(p) with 
some error fi{p) +rii(p), consists in averaging all these measurements, giving 
a specific weight u>i to each of them. Then, we look for a solution of the form: 

%>) = ^l w i{p)y*{p) ( 3 ) 

i 

where the weights Wi(p) are chosen to maximize some criterion about the 
reconstructed estimate s(p) of s(p), while keeping the component of interest 
unchanged. This requires in particular that for allp, the sum of the coefficients 
u>i(p) should be equal to 1. 

2.1 Simple ILC 

The simplest version of the ILC consists in minimising the variance a 2 of the 
map s(p) using weights independent of p (so that Wi (p) — Wi independent of 
p), with J2 w i = 1- I R this case, the estimated component is 

s(p) = ^2™iyi(p) 

= s (p) +^2,Wifi{p) + y ^2,w i n i {p). (4) 

i i 

Hence, under the assumption of de-correlation between s(p) and all fore- 
grounds, and between s(p) and all noises, the variance of the error is minimum 
when the variance of the ILC map itself is minimum. 

2.2 ILC implementation 

We now outline a practical implementation of the ILC method. For definite- 
ness (and simplicity), we will assume here that the data is in the form of 
harmonic coefficients s(£, m). The variance of the ILC map is: 
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(T 2 = ^^«) , C(«,m)ifl = w t CM) (5) 

e>l m=-i 

where C(£,m) = (y(£,m)y' (£,m)) is the covariance matrix of the observa- 
tions in mode (£, m), and C is the covariance summed over all modes except 
£ = 0. y(£, to) and w stand for the vectors of generic element yi{£, m) and «/; 
respectively. The minimum, under the constraint of^uii = 1, is obtained, 
using the Lagrange multiplier method, by solving the linear system 



v d 

dwi 



(t2 + a ( 1 -E w «)] =° 

5> 4 = 1 (6) 



Straightforward linear algebra gives the solution 

Wl = S-[c-i].. (7) 

Note that if the template of emission of the component of interest is the same 
at all frequencies of observation, but the observations are not calibrated with 
respect to this component, equation 2 becomes: 

y l {p) = a l s{p) + fi(p) + m(p) (8) 

In this case, it is still possible to implement an ILC. The solution is 

where A is the vector of recalibration coefficients o^. This solution of equation 
9 is equivalent to first changing the units in all the observations to make the 
response 1 in all channels, and then implementing the solution of equation 7. 



2.3 Examples of ILC separation: particular cases 

This idea of ILC is quite natural. It has, however, several unpleasant features, 
which makes it non-optimal in most real-case situations. Before discussing 
this, let us examine now what happens in two simple particular cases. 

Case 1: Noisy observations with no foreground 

If there are no foregrounds, and the observations are simply noisy maps of 
s(p), with independent noise for all channels of observation, the ILC solution 
should lead to a noise- weighted average of the measurements. 

Let us assume for simplicity that we have two noisy observations, y\ and 
?/2, with yi — s + ni. In the limit of very large maps, so that cross correlations 
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between s, ri\ and n 2 vanish, the covariance matrix of the observations takes 
the form: 

~S + N! S 

S S + N 2 



c 



where S is the variance of the signal (map) of interest, and Ni and N 2 the 
noise variances for the two channels. The inverse of C is: 



C 



1 



dct(C) 



S + N 2 -S 
-S S + m 



and applying equation 7, we get wi — N 2 /(N\ + N2) and w 2 = Ni/(Ni+N 2 ). 
This is the same solution as weighting each map i proportionally to 1/iVj. 

Case 2: Noiseless observations with foregrounds 

Let us now examine the opposite extreme, where observations are noiseless 
linear mixtures of several astrophysical components. Consider the case of 
two components, with two observations. We can write the observations as 
y = As, where A is the so-called "mixing matrix", and s = (si,s 2 )^ the 
vector of sources. 

The covariance of the observations is 



and its inverse is 



c = yy^ = a ss^A* 
cr 1 = [A^ 1 [sst]-^- 1 



(10) 



Let us assume that we are interested in the first source. The data are then 
calibrated so that the mixing matrix A and its inverse arc of the form 



1 a 12 
1 a 22 



and 



1 



det(A) 



022 — Ol2 
-1 1 



Then, if we assume that components 1 and 2 are uncorrelated, equation 10 
yields 

1 



C" 1 



(dot (A)) 2 



a 22 -1 




"Sf 1 




a 22 — a\ 2 


-012 1 




s 2 \ 




-1 1 



(11) 



where S\ and are the variances of components 1 and 2 respectively. After 
expansion of the matrix product, we get: 



C 



1 



(det(A)) 2 
and using equation 7, we get 



(-a 22 a 12 S^ - S2 1 ) (a 2 12 S^ + S^ 1 ) 



w = 



1 

(a 22 - a 12 ) 



a 22 
-a 12 



(12) 



(13) 
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which is the transpose of the first line of matrix A~\ so that si = w.y = 
S\. As expected, if the covariance of the two components vanishes, the ILC 
solution is equivalent, for the component of interest, to what is obtained by 
inversion of the mixing matrix. 

What happens now if the two components are correlated? Instead of the 
diagonal form diag(5i, S2), the covariance matrix of the sources contains an 
off-diagonal term S12, so that equation 11 becomes: 



cr 1 = 



1 



(det(A)) 2 (S1S2 - S? 2 ) 



022 -1 




' s 2 


S12 




a 22 ~ a 12 


_-a 12 1 




S12 






-1 1 



which yields the solution 



w 



(a 2 2 - 012) 



^22 + S12/ S2 

— a\2 — S12/ S2 



(14) 



(15) 



The ILC is not equivalent anymore to the inversion of the mixing matrix A. 
Instead, the estimate si of Si is: 



~ S12 
si = w.y = si- -£-S2 

<-?2 



(16) 



The ILC result is biased, giving a solution in which a fraction of s 2 is sub- 
tracted erroneously, in proportion to the correlation between si and s 2 , to 
lower as much as possible the variance of the output map. The implication 
of this is discussed in paragraph 2.6. 



2.4 Improving the ILC method 

With the exception of the CMB, diffuse sky emissions are known to be very 
non stationnary (e.g. galactic foregrounds are strongly concentrated in the 
galactic plane). In addition, most of the power is concentrated on large scales 
(the emissions are strongly correlated spatially). As the ILC method mini- 
mizes the total variance of the ILC map (the integrated power from all scales, 
as can be seen in equation 5), the weights u>i are strongly constrained essen- 
tially by regions of the sky close to the galactic plane, where the emission 
is strong, and by large scales, which contain most of the power. In addition, 
the ILC method finds weights resulting from a compromise between reducing 
astrophysical foreground contamination, and reducing the noise contribution. 
In other words, for a smaller variance of the output map, it pays off more to 
reduce the galactic contamination in the galactic plane and on large scales, 
where it is strong, rather than at high galactic latitude and on small scales, 
where there is little power anyway. This particularity of the ILC when imple- 
mented globally is quite annoying for CMB studies, for which all scales are 
interesting, and essentially the high galactic latitude data is useful. 

Away from the galactic plane and on small scales, the best linar combina- 
tion for cleaning the CMB from foregrounds and noise may be very different 
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from what it is close to the galactic plane and on large scales. A very natural 
idea to improve on the ILC is to decompose sky maps in several regions and/or 
scales, and apply an ILC independently to all these maps. The final map is 
obtained by adding-up all the ILC maps obtained independently in various 
regions and at different scales. Applications of these ideas are discussed in 
the next paragraph. 

2.5 ILC-based foreground-cleaned CMB map from WMAP data 

A map of CMB anisotropies has been obtained using the ILC method 
[Bennett et al. (2003)] on first year data from the WMAP mission, and has 
been released to the scientific community as part of the first year WMAP 
data products. 

The input data is the set of five all sky, band averaged maps for the K, 
Ka, Q, V and W frequency bands, all of which smoothed to the same 1 degree 
resolution for convenience. The ILC is performed independently in 12 regions, 
11 of which being in the WMAP kp2 mask at low galactic latitudes, designed 
to mask out regions of the sky highly contaminated by galactic foregrounds. 
This division into twelve regions is justified by the poor performance of the 
ILC on the full sky, interpreted as due to varying spectral indices of the 
astrophysical foregrounds. Discontinuities between the regions are reduced 
by using smooth transitions between the regions. 

Little detail is provided on the actual implementation of the ILC by the 
WMAP team. Apparently, a non-linear iterative minimization algorithm was 
used, instead of the linear solution outlined in paragraph 2.2. Although there 
does not seem to be any particular reason for this choice, in principle the 
particular method chosen to minimize the variance does not matter, as long as 
it finds the minimum efficiently. There seem to be, however, indications that 
the convergence was not perfect, as discussed by Eriksen and collaborators 
in a paper discussing the ILC and comparing the results of the several of 
its implementations on WMAP data [Eriksen et al. (2004)]. Caution should 
probably be taken when using the WMAP ILC map for any purpose other 
than a visual impression of the CMB. 

Tegmark and collaborators have improved the ILC method in several re- 
spects, and provide an independent CMB map obtained from WMAP data 
by ILC [Tegmark et al. (2003)]. Their implementation allows the weights to 
depend not only on the region of the sky, but also on angular scale, as dis- 
cussed in paragraph 2.4. In addition, they partially deconvolve the WMAP 
maps in harmonic space to put them all to the angular resolution of the chan- 
nel with the smallest beam, rather than smoothing all maps to put them all 
to the angular resolution of the channel with the largest beam. As a result, 
their ILC map has better angular resolution, but higher total noise. The high 
resolution map, however, can be filtered using a Wiener filter for minimal 
variance of the error. The Wiener-filtered map is obtained by multiplying 
each aim mode of the map by a factor 
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W(t,m) = Ce/S t 

where Ce is the estimated CMB power spectrum (computed for the cosmo- 
logical model fitting best the WMAP data estimate) , and St is the estimated 
power spectrum of the noisy CMB map obtained by the authors using their 
ILC method. 

The CMB map obtained by the WMAP team from first year data is shown 
in figure 1. For comparison, the map obtained by Tegmark et al. is shown in 
figure 2. Both give a good visual perception of what the CMB field looks like. 

2.6 Comments about the ILC 

The ILC has been used essentially to obtain a clean map of CMB emission. In 
principle, nothing prevents using it also for obtaining cleaned maps of other 
emissions, with the caveats that 

The data must be calibrated with respect to the emission of interest, so 
that the data takes the form of equation 2. This implies that the template 
of emission of the component of interest should not change significantly 
with the frequency-band of observation. This is the case for the CMB 
(temperature and polarisation), or for the SZ effect (to first order at 
least... more on this later). 

The component of interest should not be correlated with other compo- 
nents. Galactic components, being all strongly concentrated in the galactic 
plane, can thus not be recovered reliably with the ILC. 

This issue of decorrelation of the component of interest s(p) and the fore- 
grounds can also generate problems in cases where the empirical correlation 
between the components does not vanish. As demonstrated in 2.3, the ILC 
method will not work properly (biasing the result) if the assumption the 
component of interest s(p) are correlated for whatever reason. In particular, 
small data sets, even if they are realisations of actually uncorrelated random 
processes, are always empirically correlated to some level. For this reason, 
the ILC should not be implemented independently on too small subsets of 
the original data (very small regions, very few modes). 

Finally, whereas the ILC is a powerful tool when nothing is known about 
the data, it is certainly non optimal when prior information is available. 
Foreground emissions are discussed in some detail in following section. 

3 Sky emission model: components 

"Know your enemy"... This statement, borrowed from elementary military 
wisdom, applies equally well in the fight against foreground contamination. 
Prior knowledge about astrophysical components indeed has been widely used 
in all practical CMB data analyses. Methods can then be specifically tailored 
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ILC map by the WMAP team 




Fig. 1. The ILC map of the CMB obtained by the WMAP team (one year data). 
Residuals of galactic emission are clearly visible in the center of the map. The color 
scale spans a range of -300 to +300 fiK thermodynamic, although localised residuals 
exceed these values. 

ILC map by Tegmark et al., Wiener filtered 




Fig. 2. The foreground-cleaned CMB map of Tegmark et al., obtained by the ILC 
method described in [Tegmark et al. (2003)], after Wiener filtering. The effect of 
the region and scale-dependent weighting can be seen in the center of the map 
(galactic center) where the map looks smoother and flatter than elsewhere. The 
color scale spans a range of -300 to +300 fiK, although localised residuals exceed 
these values, as in figure 1. The superior angular resolution can clearly be seen. 
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to remove foregrounds based on their physical properties, in particular their 
morphology, their localisation, and their frequency scaling based on the phys- 
ical understanding of their emission mechanisms. 

In addition to knowledge about the unwanted foregrounds, prior knowl- 
edge about the component of interest is of the utmost importance for its 
identification and separation in observations. In the ILC method discussed 
above, for instance, the prior knowledge of the emission law of the CMB 
(derivative of a blackbody) is specifically used. 

3.1 The various astrophysical emissions 

Astrophysical emissions relevant to the framework of CMB observations can 
be classified in three large categories (in addition to the CMB itself). Diffuse 
galactic emission, extragalactic emission, and solar system emission. 

Diffuse galactic emissions originate from the local interstellar medium 
(ISM) in our own galaxy. The ISM is constituted of cold clouds of molecular 
or atomic gas, of an intercloud medium which can be partly ionised, and of 
hot ionized regions presumably formed by supernovae. These different media 
are strongly concentrated in the galactic plane. The intensity of correspond- 
ing emissions decreases with galactic latitude with a cosecant law behaviour 
(the optical depth of the emitting material scales proportionnally to 1/ sin b) . 
Energetic free electrons spiralling in the galactic magnetic field generate syn- 
chrotron emission, which is the major foreground at low frequencies (below 
a few tens of GHz). Warm ionised material emits free-free (Bremstrahlung) 
emission, due to the interaction of free electrons with positively charged nu- 
clei. Small particles of matter (dust grains and macromolecules) emit radia- 
tion as well, through thermal greybody emission, and possibly through other 
mechanisms. 

Extragalactic emissions arise from a large background of resolved and 
unresolved radio and infrared galaxies, as well as clusters of galaxies. The 
thermal and kinetic Sunyaev-Zel'dovich effects, due to the inverse Compton 
scattering of CMB photons off hot electron gas in ionized media, are of special 
interest for cosmology. These effects occur, in particular, towards clusters of 
galaxies, which are known to comprise a hot (few keV) electron gas. Infrared 
and radiogalaxies emit also significant radiation in the frequency domain of 
interest for CMB observations, and contribute both point source emission 
from nearby bright objects, and a diffuse background due to the integrated 
emission of a large number of unresolved sources, too faint to be detected 
individually, but which contribute sky background inhomogeneities which 
may pollute CMB observations. 

Solar system emission comprises emissions from the planets, their satel- 
lites, and a large number of small objects (asteroids). In addition to those, 
there is diffuse emission due to dust particles and grains in the ecliptic plane 
(zodiacal light). The latter is significant essentially at the highest frequencies 
of an instrument like the Planck HFI [Maris et al. (2006)]. 
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In the rest of this section, we briefly outline the general properties of 
these components and the modeling of their emission in the centimetre to 
sub-millimetre wavelength range. 

3.2 The Cosmic Microwave Background 

The cosmic microwave background, relic radiation from the hot big bang 
emitted at the time of decoupling when the Universe was about 370,000 years 
old, is usually thought of (by cosmologists) as the component of interest in 
the sky emission mixture. Millimetre and submillimctre wave observations, 
however, sometimes aim not only at measuring CMB anisotropics, but also 
other emissions. In this case, the CMB becomes a noxious background which 
has to be subtracted out of the observations, just as any other. 

The CMB emission is relatively well known already. The main theo- 
retical framework of CMB emission can be found in any modern text- 
book on cosmology, as well as in several reviews [Hu & Dodelson (2002), 
White & Cohn (2002)]. The achievement of local thermal equilibrium in the 
primordial plasma before decoupling, together with the very low level of the 
perturbations, guaranties that CMB anisotropies are properly described as 
the product of a spatial template AT{p) — Tqmb(p) ~ Tqmb, and a function 
of v (frequency scaling) which is the derivative of a blackbody with respect 
to temperature: 



In the standard cosmological model, the CMB temperature fluctuation map 
AT(p) is expected to be a realisation of a stationary Gaussian random field, 
with a power spectrum d displaying a series of peaks and troughs (the 
acoustic peaks), the location and relative size of which are determined by a 
few free parameters of the cosmological model. 3 

Good maps of sky emission at a resolution of about 15 arcminutes, ob- 
tained from WMAP data in the frequency range 20-90 GHz, clearly com- 
prise at high galactic latitude an astrophysical component compatible with all 
these predictions. The power spectrum is measured with excellent accuracy by 
WMAP up to the second Doppler peak, while complementary balloon-borne 
and ground-based experiments yield additional measurements at higher I 
(smaller scales). 

Efficient diffuse component separation methods should make use of this 
current status of knowledge about the CMB: 

The power spectrum Ce is defined as the set of variances of the coefficients ae m 
of the expansion of the random field representing CMB relative temperature 
fluctuations AT(p)/TcMS onto the basis of spherical harmonics on the sphere 
Ye m (6, <j>). The stationarity and isotropy of the random field guarantees that the 
variance of ae m (coefficients Ce) is independent of m. 



AI v {p) = AT CMB (p) 



~ dB u {T) ~ 
dT 



J T=Tcmb^2.726K 



(17) 
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Law of emission, known to a high level of precision to be the derivative of 
a blackbody with respect to temperature, as expected theoretically and 
checked experimentally with the Boomerang [de Bernardis et al. (2000)] 
and Archeops [Tristram et al. (2005)] multifrequency data sets, as well as 
with the WMAP data [Bennett et al. (2003), Patanchon et al. (2005)] 
Stationnarity and gaussianity to a high level of accuracy, as expected 
theoretically and checked on WMAP data [Komatsu et al. (2003)] 
Good cosmological prior on the power spectrum of the fluctuations, val- 
idated experimentally with several data sets [Netterfield et al. (2002), 
Hinshaw et al. (2006)] 

A good visual impression of all-sky CMB emission is given in figures 1 
and 2. The present status of knowledge of the power spectrum Cg is shown 
in figure 3. 



Angular scale 
90° 2° 0.5° 0.2° 
6000 P ' ' r_ 




E i i i i i I i i i i I I 

10 100 500 1000 1500 

Multipole moment I 



Fig. 3. Present-day best constraints of the CMB temperature power spec- 
trum (from [Hinshaw et al. (2006)]). Data sets in addition to WMAP 3-year 
data are from [Jones et al. (2005), Kuo et al. (2004), Readhead et al. (2004), 
Dickinson et al. (2004)] 
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The extraction of CMB emission from a set of multifrequency observations 
may be done with the following objectives in mind (at least): 

Get the best possible map of the CMB (in terms of total least square 
error, from noise and foregrounds together); 

Get the CMB map with the least possible foreground contamination; 
Get the CMB map for which spurious non-gaussianity from foregrounds, 
noise and systematic effects is minimal; 

Get the best possible estimate of the CMB angular power spectrum... 

Obviously, the best component separation method for extracting the CMB 
will depend on which of the above is the primary objective of component 
separation. 

3.3 Emissions from the diffuse interstellar medium 

Synchrotron emission 

Synchrotron emission arises from energetic charged particles spiralling in a 
magnetic field. In our galaxy, such magnetic fields extend outside the galactic 
plane. Energetic electrons originating from supernovae shocks, spiraling in 
this field, can depart the galactic plane and generate emission even at high 
galactic latitudes. For this reason, synchrotron emission is less concentrated 
in the galactic plane than free-free and dust. 

The frequency scaling of synchrotron emission is a function of the distri- 
bution of the energies of the radiating electrons. For number density distri- 
butions N{E) oc _E~ 7 , the flux emission is also in the form of a power law, 
I v oc v^ a 1 with a = (7— 1)/2. In Rayleigh- Jeans (RJ) temperature AT oc 
with (3 = a + 2. Typically, (3 ranges from 2.5 to 3.1, and is somewhat variable 
across the sky. 

In spite of a moderate sensitivity for current standards, the 408 MHz all 
sky map [Haslam et al. (1981)], dominated by synchrotron emission, gives a 
good visual impression of the distribution of synchrotron over the sky. 

In principle, synchrotron emission can be highly polarised, up to 50-70% 

Free-Free emission 

Free-free emission is the least well known observationally of the three major 
emissions originating from the galactic interstellar medium in the millimetre 
and centimetre wavelength range. This emission arises from the interaction of 
free electrons with ions in ionised media, and is called "free-free" because of 
the unbound state of the incoming and outgoing electron. Alternatively, free- 
free is called "Bremsstrahlung" emission ("braking radiation" in German), 
because photons are emitted while electrons loose energy by interaction with 
the heavy ions. 
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Fig. 4. The 408 MHz all-sky synchrotron map [Haslam et al. (1981)]. Data and 
images are available on the NASA Lambda web site. 



Theoretical calculations of free-free emission in an electrically neutral 
medium consisting of ions and electrons gives an estimate of the brightness 
temperature at frequency v for free-free emission of the form: 

T ff ~ 0.08235 T" 35 v~ fi j NeN.dl (18) 

Ao.s. 

where T e is in Kelvin, v is in GHz and the line of sight integral of electron and 
ion density in cm _6 pc [Smoot (1998)]. Theoretical estimates of the spectral 
index, (3, range from about 2.1 to 2.15, with errors of ±0.03. 

While free- free emission is not observed directly, as it never dominates over 
other astrophysical emissions, the source of its emission (mainly ionised hy- 
drogen clouds) can be traced with hydrogen recombination emission lines, and 
particularly Ha emission. The connection between Ha and free-free has been 
discussed extensively by a number of authors [Smoot (1998), Valls-Gabaud (1998), 
McCullough et al. (1999)]. We have: 

~ 10.4 v- 2 - 14 T 4 527 10 029 /^ (1 + 0.08) (19) 

Iq]RJ 

Where Tff[mK] is the free- free brightness temperature in mK, I a [R] the Ha 
surface brightness in Rayleigh, v the frequency, and T4 the temperature of 
the ionized medium in units of 10 4 K. The Rayleigh (R) is defined as 1 R = 
(10 6 /4tt) photons/cm 2 /s/sr. 

Free-free emission, being due to incoherent emissions from individual elec- 
trons scattered by nuclei in a partially ionised medium, is not polarised (to 
first order at least). 
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Thermal emission of galactic dust 

The present knowledge of interstellar dust is based on extinction observations 
from the near infrared to the UV domain, and on observations of its emission 
from radio frequencies to the infrared domain. 

Dust consists in small particles of various materials, essentially silicate and 
carbonaceous grains of various sizes and shapes, in amorphous or crystalline 
form, sometimes in aggregates or composites. Dust is thought to comprise 
also large molecules of polycyclic aromatic hydrocarbon (PAH). The sizes 
of the grains range from few nanometers for the smallest, to micrometers 
for the largest. They can emit through a variety of mechanisms. The most 
important for CMB observations is greybody emission in the far infrared, at 
wavelengths ranging from few hundreds of microns to few millimeters. The 
greybody emission is typically characterised by a temperature Td us t and by 
an emissivity proportional to a power of the frequency v. 

I v cx u' 3 B u {T dust ) 

where B V {T) is the usual blackbody emission 

MT)—^^ (21) 

This law is essentially empirical. In practice, dust clouds along the line of 
sight can have different temperatures and different compositions: bigger or 
smaller grains, different materials. They can thus have different emissivities 
as well. Temperatures for interstellar dust are expected to range from about 
5 Kelvin to more than 30 Kelvin, depending on the heating of the medium 
by radiation from nearby stars, with typical values of 16-18 K for emissivity 
indices (3 ~ 2. 

In principle, thermal emission from galactic dust should not be strongly 
polarised, unless dust particles are significantly asymmetric (oblate or pro- 
late) , and there exists an efficient process for aligning the dust grains in order 
to create a significant statistical asymmetry. Preliminary dust observations 
with the Archeops instrument [Benoit et al. (2004), Ponthieu et al. (2005)] 
seems to indicate polarisation levels of the order of few per cent, and as high 
as 15-20 per cent in a few specific regions. 

Spinning dust or anomalous dust emission 

In the last years, increasing evidence for dust-correlated emissions at fre- 
quencies below 30 GHz, in excess to expectations from synchrotron and 
free-free, has aroused interest in possible non-thermal emissions from galac- 
tic dust [Kogut et al. (1996), Lcitch et al. (1997)]. Among the possible non- 
thermal emission mechanisms, spinning dust grains offer an interesting option 
[Draine & Lazarian (1998)]. 



(20) 
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At present, there still is some controversy on whether the evidence for 
non-thermal dust emission is robust enough for an unambiguous statement. 
Observations of different sky regions, indeed, yield somewhat different results 
[Dickinson et al. (2006), Fernandez-Cerezo et al. (2006)], which may be due 
either to varying dust cloud properties, or to differences in the analyses and 
interpretations, or both. Certainly, investigating this question is an objec- 
tive of primary interest for diffuse component separation methods (especially 
blind ones) in the near future. 

3.4 The SZ effect 

The Sunyaev Zel'dovich (SZ) effect [Sunyaev & Zeldovich (1972)] is the in- 
verse Compton scattering of CMB photons on free electrons in ionised media. 
In this process, the electron gives a fraction of its energy to the scattered CMB 
photon. There are, in fact, several SZ effects: The thermal SZ effect is due to 
the scattering of photons on a high temperature electron gas, such as can be 
found in clusters of galaxies. The kinetic SZ effect is due to the scattering on 
a number of electrons with a global radial bulk motion with respect to the 
cosmic background. Finally, the polarised SZ effect is a second order effect 
due to the kinematic quadrupole of the CMB in the frame of an ensemble of 
electrons with a global transverse bulk motion with respect to the CMB. 

SZ effects arc not necessarily linked to clusters of galaxies. Any large 
body with hot ionised gas can generate significant effects. It has been pro- 
posed that signatures of inhomogeneous reionisation can be found via the 
kinetic and thermal SZ effect [Aghanim et al. (1996), Gruzinov & Hu (1998), 
Yamada et al. (1999)]. However, the largest expected SZ signatures originate 
from ionised intra-cluster medium. 

Clusters of galaxies 

Clusters of galaxies, the largest known massive structures in the Universe, 
form by gravitational collapse of matter density inhomogeneitics on large 
scales (comoving scales of few Mpc). They can be detected either optically 
from concentrations of galaxies at the same redshift, or in the submillimeter 
by their thermal SZ emission, or by the effect of their gravitational mass in 
weak shear maps, or in X-ray. The hot intracluster baryonic gas can be ob- 
served through its X-ray emission due to Bremsstrahlung (free-free) emission 
of the electrons on the nuclei, which permits to measure the electron temper- 
ature (typically a few keV). On the sky, typical cluster angular sizes range 
from about one arcminute to about one degree. Clusters are scattered over 
the whole sky, although this distribution follows the repartition of structure 
on the largest scales in the universe. Large scale SZ effect observations may 
be also used to survey the distribution of hot gas on these very large scales, 
although such SZ emission, from filaments and pancakes in the distribution, is 
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expected to be at least an order of magnitude lower in intensity than thermal 
SZ emission from the clusters themselves. 

Each cluster of galaxies has its own thermal, kinetic and polarised SZ emis- 
sion. These various emissions and their impact on CMB observations and for 
cosmology have been studied by a variety of authors. Useful reviews have been 
made by Birkinshaw [Birkinshaw (1999)] and Rephaeli [Rephaeli (2002)], for 
instance. 



Thermal SZ 



The thermal SZ effect generated by a gas of electrons at temperature T e is, in 
fact, a spectral distortion of the CMB emission law. It is common to consider 
as the effect the difference Al v — l v — B„(Tcmb) between the distorted CMB 
photon distribution I v and the original one B„(Tcmb)- In the non-relativistic 
limit (when T e is lower than about 5 keV, which is the case for most clusters), 
the shape of the spectral distortion does not depend on the temperature. The 
change in intensity due to the thermal SZ effect is: 



A T Xe 



(e x ~ 1) 



x(e x + 1) 



BATcmb) (22) 



(e x - 1) 

where B u (Tcmb) is the Planck blackbody emission law at CMB temperature 

2hv z 1 



B v (Tcmb.) = 

and x = hv/kTcMB- The dimensionless parameter y (Comptonisation pa- 
rameter) is proportional to the integral of the electron pressure along the line 
of sight: 

f kT - ,n 

V = ^«eOthomsonUf 

Jlos m e c 2 

where T e is the electron temperature, m e the electron mass, c the speed of 
light, n e the electron density, and crthomson the Thomson cross section. 

Kinetic SZ 



The kinetic SZ effect is generated by the scattering of CMB photons off 
an electron gas in motion with respect to the CMB. This motion generates 
spectral distortions with the same frequency scaling as CMB temperature 
fluctuations, and are directly proportionnal to the velocity of the electrons 
along the line of sight. As the effect has the same frequency scaling as CMB 
temperature fluctuations, it is, in principle, indistiguishable from primordial 
CMB. However, since the effect arises in clusters of galaxies with typical sizes 
1 arcminute, it can be distinguished to some level from the primordial CMB 
by spatial filtering, especially if the location of the clusters most likely to 
generate the effect is known from other information (e.g. the detection of the 
clusters through the thermal SZ effect). 
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Polarised SZ 

The polarised SZ effect arises from the polarisation dependence of the Thom- 
son cross section: 

o~t oc |ei.e2| 

where e\ and e2 are the polarisation states of the incoming and out- 
going photon respectively. A quadrupole moment in the CMB radiation 
illuminating the cluster electron gas generates a net polarisation, at a 
level typically two orders of magnitude lower than the kinetic SZ effect 
[Sunyaev & Zeldovich (1980), Audit & Simmons (1999), Sazonov & Sunyaev 
Therefore, the kinetic SZ effect has been proposed as a probe to investigate 
the dependence of the CMB quadrupole with position in space. Cluster trans- 
verse motions at relativistic speed, however, generate also such an effect from 
the kinematic quadrupole induced by the motion. Multiple scattering of CMB 
photons also generates a low-level polarisation signal towards clusters. 

The polarised SZ effects has a distinctive frequency scaling, independent 
(to first order) to cluster parameters and to the amplitude of the effect. 
Amplitudes are proportionnal: 

- to t for the intrinsic CMB quadrupole effect, 

- to (v t /c) 2 T for the kinematic quadrupole effect 

to (fcT e /m e c 2 )r 2 and (v t /c)T 2 for polarisation effects due to double scat- 
tering. 

Here r is the optical depth, v t the transverse velocity, c the speed of light, k 
the Boltzmann constant, and T e and m e the electron temperature and mass. 

As polarised effects arise essentially in galaxy clusters, they can be sought 
essentially in places where the much stronger thermal effect is detected, which 
will permit to improve the detection capability significantly. Polarised SZ 
emission, however, is weak enough that it is not expected to impact signifi- 
cantly the observation of any of the main polarised emissions. 

Diffuse component or point source methods for SZ effect separation? 

The SZ effect is particular in several respects. As most of the emission comes 
from compact regions towards clusters of galaxies (at arcminute scales), most 
of the present-day CMB experiments do not resolve clusters individually 
(apart for a few known extended clusters). For this reason, it seems natu- 
ral to use point source detection methods for cluster detection (see review by 
Barrciro [Barrciro (2005)]). However, the very specific spectral signature, the 
presence of a possibly large background of clusters with emission too weak for 
individual cluster detection, and the interesting possibility to detect larger 
scale diffuse SZ emission, makes looking for SZ effect with diffuse component 
separation methods an interesting option. 



Diffuse source separation in CMB observations 



19 



3.5 The infrared background of unresolved sources 

The added emissions from numerous unresolved infrared sources at high red- 
shift make a diffuse infrared background, detected originally in the FIRAS 
and DIRBE data [Puget et al. (1996)]. Because each source has its specific 
emission law, and because this emission law is redshifted by the cosmological 
expansion, the background does not have a very well defined frequency scal- 
ing. It appears thus, in the observations at various frequencies, as an excess 
emission correlated between channels. The fluctuations of this background 
are expected to be significant at high galactic latitudes (where not masked 
by much stronger emissions from our own galaxy), and essentially at high 
frequencies (in the highest frequency channels of the Planck HFI. 

3.6 Point sources 

The "point sources" component comprises all emissions from astrophysical 
objects such as radio galaxies, infrared galaxies, quasars, which are not re- 
solved by the instruments used in CMB observations. For such sources, the 
issues are both their detection and the estimation of parameters describ- 
ing them (flux at various frequencies, location, polarisation...), and specific 
methods are devised for this purpose. For diffuse component separation, they 
constitute a source of trouble. Usually, pixels contaminated by significant 
point source emission are blanked for diffuse component separation. 

4 Reduction of foreground contamination 

The simplest way of avoiding foreground contamination consists in using prior 
information on emissions to reduce their impact on the data: by adequate 
selection of the region of observation, by masking some directions in the sky, 
by choosing the frequency bands of the instrument, or, finally, by subtracting 
an estimate of the contamination. All of these methods have been used widely 
in the context of CMB experiments. 

4.1 Selection of the region of observation 

Perhaps the most obvious solution to avoid contamination by foregrounds is 
to design the observations in such a way that the contamination is minimal. 
This sensible strategy has been adopted by ground-based and balloon-borne 
experiments observing only a part of the sky. In this case, CMB observations 
are made away from the galactic plane, in regions where foreground contam- 
ination from the galactic emissions is known to be small. The actual choice 
of regions of observation may be based on existing observations of dust and 
synchrotron emission at higher and lower frequencies, picking those regions 
where the emission of these foregrounds is known to be the lowest. 
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The drawback of this strategy is that the observations do not permit 
to estimate very well the level of contamination, nor the properties of the 
foregrounds. 

4.2 Masking 

For all-sky experiments, a strategy for keeping the contamination of CMB ob- 
servations by foregrounds consists in masking regions suspected to comprise 
significant foreground emissions, and deriving CMB properties (in particu- 
lar the CMB power spectrum) in the "clean" region. The drawback of this 
strategy is that sky maps are incomplete. 

Typically, for CMB observations, pixels contaminated by strong point 
sources (radio and infrared galaxies) are blanked, as well as a region contain- 
ing the galactic plane. Such masks have been used in the analysis of WMAP 
data. 

4.3 Selection of the frequency bands of the instrument 

Of course, the selection of the frequency of observation to minimize the overall 
foreground contamination is a sensible option. For this reason, many CMB 
experiments aim at observing the sky around 70-100 GHz. Ground-based 
observations, however, need to take into account the additionnal foreground of 
atmospheric emission, which leaves as best windows frequency bands around 
30 GHz, 90 GHz, 150 GHz, and 240 GHz. 

Figure 5 shows the expected typical frequency scalings for the major dif- 
fuse emission astrophysical components, including the CMB. For efficient 
component separation, CMB experiment, ideally, should comprise two or 
three channels around 70-100 GHz where CMB dominates, one channel 
around 217 GHz (the zero of the SZ effect), two channels at higher frequencies 
to monitor dust emission, and 3-4 channels at lower frequencies to monitor 
low frequency foregrounds. 

Below 100 GHz, the present state of the art technology suggests the use 
of radiometers with high electron mobility transistor (HEMT) amplifiers, 
whereas above 100 GHz, low temperature bolometers provide a significantly 
better sensitivity that any other techniques. Typically, a single experiment 
uses one technology only. For Planck specifically, two different instruments 
have been designed to cover all the frequency range from 30 to 850 GHz. 

4.4 Foreground cleaning 

As a refinement to the above simple observational strategies, a first-order 
estimate of foreground contamination, based on observations made at low 
and high frequencies, can be subtracted from the observations. Depending on 
the accuracy of the model, the overall level of contamination can be reduced 
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Fig. 5. The frequency scaling of CMB and most relevant diffuse foregrounds, in 
Rayleigh- Jeans temperature, between 10 GHz and 1 THz. Depending on the rela- 
tive amplitude of synchrotron, bremsstrahlung and dust emissions, the minimum 
of galactic foregrounds is somewhat below 100 GHz. Free-free emission decreases 
roughly as v~ 21 and synchrotron as v~ z , while dust insreases as v 2 . The SZ effect 
is the major emission towards rich clusters, but is very localised. The thickness 
of the bands illustrates uncertainties as to the level of foregrounds, as well as un- 
certainties in the frequency scaling for synchrotron, free-free and dust emissions. 
Anomalous dust emission is not represented, due to our present lack of knowledge 
of the existence and nature of such a component. 



by a factor of a few at least, which permits to reduce the amount of cut 
sky. This strategy, in particular, has been used by the WMAP team for the 
analysis of first year WMAP data [Bennett et al. (2003)]. 

Observations at low frequencies (10-40 GHz) can be used to map syn- 
chrotron emission and model its contribution in the 70-100 GHz range. Sim- 
ilar strategies can be used towards the high frequency side to model dust 
emission and subtract its contribution from CMB channels. For this purpose, 
models of emission as good as possible are needed, and the cleaning can be no 
better than the model used. There is always, therefore, a trade-off between a 
sophisticated model with simple correction methods (subtraction of an inter- 
polation, simple decorrelation), and a simple model with sophisticated statis- 
tical treatments (multi- frequency filtering, independent component analysis). 
Which approach is best depends on a number of issues, and the answer is not 
completely clear yet. 
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5 The linear model and system inversion 

The most popular model of the observations for source separation in the 
context of CMB observations probably is the linear mixture. 

In this model, all components arc assumed to have an emission which can 
be decomposed as the product of a spatial template independent of the fre- 
quency of observation, and of a spectral emission law which does not depend 
on the pixel. The total emission at frequency v, in pixel p, of a particular 
emission process j is written as 

x j (v,p)=a(v)s j (p) 

or alternatively in spherical harmonics space, 

Xj(v,£m) = a(v)sj{£m) 

Forgetting for the moment annoying details concerning the response of the 
instrument (beams, frequency bands, etc..) the observation with a detector 
is then: 

Vi(p) = ^Xj{v u p) + n t (p) 
j 

where is the contribution of noise for detector i. For a set of detectors, 

this can be recast in a matrix-vector form as 

y(p) = As(p) + n(p) (23) 

Here, y(p) represent the set of maps observed with all detectors detector, and 
s(j>) are the unobserved components (one template map per astrophysical 
component). The mixing matrix A which does not depend on the pixel for a 
simple linear mixture, has one column per astrophysical component, and one 
line per detector. 

If the observations are given in CMB temperature for all detectors, and 
if the detectors are properly calibrated, each element of the column of the 
mixing matrix corresponding to CMB is equal to 1. 

The problem of component separation consists in inverting the linear sys- 
tem of equation 23. Here we first concentrate on linear inversion, which con- 
sists in finding the "best" possible matrix W (such that s = Wy is "as good 
an estimator of s as possible" ) . 

Covariances and multivariate power spectra 

In the following, a lot of use will be made of second order statistics of var- 
ious sorts of data. In general, for a collection of maps x(p) = {xi(p}}, the 
covariance will be noted as R x (p,p'), the elements of which are: 



Rij(P,p') = cov(xi(p),Xj(p')) 
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Alternatively, in harmonic space, we denote as R x (^) the multivariate power 
spectrum of x, i.e. the collection of matrices 

R x (f) = (x(e,m)x*(e,m)) 

where the brackets (.) denote ensemble average, and the dagger t denotes 
the transpose of the complex conjugate. Such a power spectrum is well de- 
fined only for stationary/isotropic random fields on the sphere for which 
(x(£, m)x\t, m)) does not depend on m. 

5.1 Simple inversion 

If A is square and non singular, in absence of any additional information, 
then the inversion is obtained by 

W = A" 1 (24) 

and we have 

s = A _1 y = s + A~ 1 n (25) 

Note that because of the remaining noise term, this inversion is not always 
the best solution in terms of residual error, in particular in the poor signal 
to noise regimes. For instance, if we have two measurements of a mixture of 
CMB + thermal dust in a clean region of the sky (low foregrounds), one of 
which, at 150 GHz, is relatively clean, and the other, at 350 GHz, quite poor 
because of high level noise, then it may be better to use the 150 GHz as the 
CMB template (even with some dust contamination), rather than to invert 
the system, subtracting little dust and adding a large amount of noise. 

In terms of residual foreground contamination however (if the criterion is 
to reject astrophysical signals, whatever the price to pay in terms of noise), 
the only solution here is matrix inversion. The solution is unbiased, but may 
be noisy. 

Note that an ILC method would produce a different solution, possibly 
slightly biased (as discussed in 2.6), but possibly better in terms of signal to 
noise ratio of the end product. 

This solution can be applied if the full matrix A is known (not only the 
column of the component of interest, i.e. the CMB), without further prior 
knowledge of the data. 

5.2 Inversion of a redundant system using the pseudo inverse 

If there are more observations than components, but nothing is known about 
noise and signal levels, then the inversion is obtained by 



W = [A 1 * A] 1 A f 



(26) 
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and we have 

s = [AtA]" 1 Aty = s+ [AtA]" 1 Atn (27) 

Again, this estimator is unbiased, but may contain a large amount of noise 
and may not be optimal in terms of signal to noise ratio. All the comments 
made in the previous paragraph hold as well for this solution. 

Note that there is no noise- weighting here, so that one single very bad 
channel may contaminate significantly all the data after inversion. It is there- 
fore not a very good idea to apply this estimator with no further thoughts. 

Note that, again, this solution can be implemented without any further 
knowledge about signal and noise - only the entries of the mixing matrix for 
all components are needed. 

5.3 A noise-weighted scheme: the Generalised Least-Square 
solution 

Let us now assume that we know something additional about the noise, 
namely, its second order statistics. These are described by noise correlation 
matrices in real space, or alternatively by noise power spectra in Fourier (for 
small maps) or in harmonic (for all-sky maps) space. 

We denote as R n the noise correlation matrix and assume, for the time 
being, that the noise for each detector i is a realization of a random gaussian 
field, the generalised (or global) least square (GLS) solution of the system of 
equation 23 is: 

W = [AtRn^A]" 1 A f R n 1 (28) 

and we have 

s = [AtR n _1 A] _1 A^R^y = s+ [A^R n _1 A] _1 A^R, n ~ 1 n (29) 

Again, the solution is unbiased. Altough there remains a noise contribution, 
this is the solution yielding the minimum variance error map for a deter- 
ministic signal (in contrast with the Wiener solution below, which optimises 
the variance of the error when the signal is stochastic, i.e. assumed to be a 
random field) . It is also the best linear solution in the limit of large signal to 
noise ratio. 

This solution is also theoretically better than the ILC when the model 
holds, but the price to pay is the need for more prior knowledge about the 
data (knowledge of the mixing matrix and of noise covariance matrices or 
power spectra). If that knowledge is unsufncient, one has to design methods 
to get it from the data itself. Such "model learning" methods will be discussed 
in section 7. 

5.4 The Wiener solution 

The Wiener filter [Wiener (1949)] has originally been designed to filter time 
series in order to suppress noise, but has been extended to a large variety 
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of applications since then. Wiener's solution requires additional information 
regarding the spectral content of the original signal and the noise. Wiener 
filters are characterized by the following: 

Both the noise and the signal are considered as stochastic processes with 
known spectral statistics (or correlation properties) - contrarily to the 
GLS method which considers the noise only to be stochastic, the signal 
being deterministic, 

The optimization criterion is the minimum least square error, 
The solution is linear. 

In signal processing, a data stream y(t) = s(t) + n(t) assumed to be a noisy 
measurement of a signal s can be filtered for denoising as follows: in Fourier 
space, each mode y(f) of the data stream is weighted by a coefficient 

W(f) = s{f) 



S(f) + N(f) 



where S(f) = (|s(/)| 2 ) and N(f) = (\n(f)\ 2 ) are ensemble averages of the 
square moduli of the Fourier coefficients of the stochastic processes s and n. 

In the limit of very small noise level N(f) <C S(f), the Wiener filter value 
is W(f) = 1, and the filter does not change the data. In the limit of very 
poor signal to noise S(f) <C N(f), the filter suppresses the data completely, 
because that mode adds noise to the total data stream, and nothing else. 

It can be shown straightforwardly that the Wiener filter minimizes the 
variance of the error of the signal estimator s(f) = W(f)y(f) (so that 
(// - s (f)\ 2d f) is minimal). 

The Wiener solution can be adapted for solving our component separation 
problem, provided the mixing matrix A and the second order statistics of 
the components and of the noise are known [Tegmark & Efstathiou (1996), 
Bouchct & Gispert (1999)] as: 

W« = [AtRn -1 A + Rr 1 ] _1 AtRn" 1 (30) 

where R s is the correlation matrix of the sources (or power spectra of the 
sources, in the Fourier or harmonic space), and R n the correlation matrix of 
the noise. The superscript (1) is used to distinguish two forms of the Wiener 
filter (the second is given later in this section). 

An interesting aspect of the Wiener filter is that: 

s = [AtRn^A + Rs" 1 ] A^R n _1 y 
= [A^R n _1 A + Rs -1 ] _1 A t R n _1 As 

+ [A t R n ~ 1 A + R a ~ 1 ] _1 A t R n ~ 1 n (31) 



The matrix in front of s is not the identity, and thus the Wiener filter does 
not give an unbiased estimate of the signals of interest. Diagonal terms can be 
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different from unity. In addition, non-diagonal terms may be non-zero, which 
means that the Wiener filter allows some residual foregrounds to be present 
in the final CMB map - the objective being to minimise the variance of the 
residuals, irrespective of whether these residuals originate from instrumental 
noise or from astrophysical foregrounds. 

As noted in [Tegmark & Efstathiou (1996)], the Wiener solution can be 
"debiased" by multiplying the Wiener matrix by a diagonal matrix remov- 
ing the impact of the filtering. The authors argue that for the CMB this 
debiasing is desirable for subsequent power spectrum estimation on the re- 
constructed CMB map. Each mode of a given component is divided by the 
diagonal element of the Wiener matrix for that component and that mode. 
This, however, destroys the minimal variance property of the Wiener solu- 
tion, and can increase the noise very considerably. There is an incompatibility 
between the objective ofobtaining a minimum variance map, and the objec- 
tive of obtaining an unbiased map which can be used directly to measure the 
power spectrum of the CMB. There is no unique method for both. 

Before moving on, it is interesting to check that the matrix form of the 
Wiener filter given here reduces to the usual form when there is one signal 
only and when the matrix A reduces to a scalar equal to unity. In that case, 
the Wiener matrix W of equation 30 reduces to 

W(f) = [1/S(f) + l/N{f)]-'/N{f) = S(f)/[N(f) + S(f)} 

where S and N are the signal and noise power spectra, and we recover the 
classical Wiener formula. 

Two forms of the Wiener Filter 

In the literature, another form can be found for the Wiener filter matrix: 

W (2 > = R s A t [R n + AR s At] _1 (32) 
It can be shown straightforwardly that if the matrices 
Mi = [A t R„ _1 A + R s _1 ] 

and 

M 2 = [R n + AR s At] 

are regular, then the forms of equation 30 and 32 are equivalent (simply 
multiply both forms by Mi on the left and M 2 on the right, and expand). 

It may seem that the form of equation 32 is more convenient, as it requires 
only one matrix inversion instead of three. Each form, however, presents 
specific advantages or drawbacks, which appear clearly in the high signal to 
noise ratio (SNR) limit, and if power spectra of all signals are not known. 
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The high SNR limit 

The two above forms of the Wiener filter are not equivalent in the high SNR 
limit. In this regime, equation 30 yields in the limit 

which is the GLS solution of equation 28, and depends only on the noise 
covariance matrix, whereas equation 32 tends to 

W^ it = R s At [ARsAtf 1 

which depends only on the covariance of the signal. Therefore, some care 
should be taken when applying the Wiener filter in the high SNR ratio regime, 
when numerical roundup errors may cause problems. 
Note that if [At A] is regular, then 

W<& t = R.At [ARsAt]- 1 

= [At A] _1 [At A] R s At [AR s At] ^ (33) 
= [AtA] _1 At (34) 

and the limit is simply the pseudo inverse of matrix A, without any noise 
weighting. Of course, when there is no noise at all, can not be imple- 

mented at all, and the Wiener solution is pointless anyway. 

What if some covariances are not known? 

It is interesting to note that even if the covariance matrix (or equivalcntly 
multivariate power spectrum) R s of all sources is not known, it is still possible 
to implement an approximate Wiener solution if the maps of observations 
are large enough to allow a good estimate of the covariance matrix of the 
observations. 

If y = As + n and if the noise and the components arc independent, the 
covariance R y of the observations is of the form: 

R y = R n + AR s A f 

Therefore, form 2 of the Wiener filter can be recast as: 

W< 2 > =R s At [Ry]" 1 (35) 

If all components are decorrelated, the matrix R s is diagonal. For the imple- 
mentation of a Wiener solution for one single component (e.g. CMB), only 
the diagonal element corresponding to the CMB (i.e. the power spectrum Ci 
of the CMB) is needed, in addition to the multivariate power spectrum of the 
observations R y . The latter can be estimated directly using the observations. 
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5.5 Comment on the various linear inversion solutions 

The above four linear solutions to the inversion of the linear system of equa- 
tion 23 have been presented by order of increasing generality, increasing com- 
plexity, and increasing necessary prior knowledge. The various solutions are 
summarised in table 1. Three comments are necessary. 

Firstly, we note that the Wiener solutions require the prior knowledge 
of the covariance matrices (or equivalcntly power spectra) of both the noise 
and the signal. For CMB studies, however, the measurement of the power 
spectrum of the CMB field is precisely the objective of the observations. 
Then, the question of whether the choice of the prior on the CMB power 
spectrum biases the final result or not is certainly of much relevance. For 
instance, the prior assumption that the power spectrum of the CMB is small 
in some I range will result in filtering the corresponding modes, and the 
visual impression of the recovered CMB will be that indeed there is little 
power at the corresponding scales. For power spectrum estimation on the 
maps, however, this effect can be (an should be) corrected, which is always 
possible as the effective filter induced by the Wiener solution is known (for 
an implementation in harmonic space, it is equal for each mode £m, for each 
component, to the corresponding term of the diagonal of W^ m A). In section 
8, a solution will be proposed for estimating first on the data themselves 
all the relevant statistical information (covariance matrices and frequency 
scalings), and then using this information for recovering maps of the different 
components. 

Secondly, we should emphasise that the choice of a linear solution should 
be made with a particular objective in mind. If the objective is to get the 
best possible map in terms of least square error, then the Wiener solution is 
the best solution if the components are Gaussian. The debiased Wiener is not 
really adapted to any objective in particular. The GLS solution is the best 
solution if the objective is an unbiased reconstruction with no filtering and 
no contamination. In practice, it should be noted that small uncertainties on 
A result in errors (biases and contamination) even for the GLS solution. 

As a third point, we note that it can be shown straightforwardly that 
for Gaussian sources and noise, the Wiener solution maximises the posterior 
probability P(s\y) of the recovered sources given the data. From Bayes theo- 
rem, the posterior probability is the product of the likelihood P(y\s) and the 
prior P(s), normalised by the evidence P(y). The normalising factor does 
not depend on s. We can write: 

P(s\y) cx cxp [-(y - As) t R n - 1 (y - As)] cxp [-^R^s] 
log(P(s|y)) =-[(y- As)tR n - 1 (y - As)] - [^R^ 1 *] + const. (36) 

where exp s\ is the Gaussian prior for s. The requirement that 



— log(P(a|y)) = 
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implies 

-AtRn -1 ^ - As) + R s : s = 

A f R n x As + R s - x s = A^n 

[AtR n - 1 A + R s - 1 ] a = A^R^y (37) 

and thus we get the solution W^. In section 6, we will dicuss the case where 
the Gaussian prior is replaced by an entropic prior, yielding yet another 
solution for s. 

5.6 Pixels, harmonic space, or wavelets? 

The simple inversion of A using the inverse or pseudo-inverse can be imple- 
mented cquivalcntly with any representations of the maps, in pixel domain, 
harmonic space, or on any decomposition of the observations on a set of func- 
tions as, e.g., wavelet decompositions [Moudden et al. (2005)]. The result in 
terms of separation is independent of this choice, as far as the representation 
arises from a linear transformation. 

If all sources and signals are Gaussian random fields, the same is true 
for GLS or Wiener inversions, provided all the second order statistics are 
properly described by the covariance matrices R n and R s . 

These covariance matrices, in pixel space, take the form of the set of 
covariances: 

Rn = {Rn inj (Pi,Pj)} 

where 

Rn inj {jPi,Pj) = {n l {p l )n j {p 3 )) 
Similarly, in harmonic space, we have: 

Rn = {RriiTij {£i ) Wli ; £j 7 Wlj ) } 

where 

Rmnjiii^miJ^mj) = (ni(£i,mi)nj(£j,mj)) 

If the number of pixels is large, if we deal with several sources and many 
channels at the same time (tens today, thousands in a few years) , the imple- 
mentation of the GLS or Wiener solution may be quite demanding in terms 
of computing. For this reason, it is desirable to implement the solution in the 
space where matrices are the most easy to invert. 

For stationnary Gaussian random fields, harmonic space implementations 
arc much easier then direct space implementations, because the covariance 
between distinct modes vanish, so that 

R ni n j {£i-,m i ,£ j ,mj) = {n l (t l ,m i )n J (l^m i ))5i> ilj 5 mimj 

The full covariance matrix consists in a set of independent covariance matrices 
(one for each mode), each of which is a small matrix of size iV c h a nneis x 
^channels for R n , and of size 

-^sources X A^ source s for R-s- 
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Solution 


W = 


Required 
prior 
knowl- 
edge 


Comments 


Inverse 


A" 1 


A 


When there are as 
many channels of ob- 
servation as com- 
ponents. Unbiased, 
contamination free. 


Pseudo-inverse 


[At A] _1 At 


A 


When there are 
more channels of 
observation than 
components. Unbi- 
ased, contamination 
free. 


GLS 


[AtR^A] -1 A f R n 1 


A and R n 


Minimises the vari- 
ance of the error 
for deterministic sig- 
nals. Unbiased, con- 
tamination free. 


Wiener 1 


[A t R n - 1 A + R s - 1 ] _1 AtR n - 1 


A, R n and R s 


Minimises the vari- 
ance of the error for 
stochastic signals. 
Biased, not free 
of contamination. 
Tends to the GLS 
solution in the limit 
of high SNR. 


Wiener 2 


R s A f [R n + AR s At] _1 


A, R n and R s 


Equivalent to 
Wiener 1. Tends to 
the pseudo inverse 
in the limit of high 
SNR. 


Debiased Wiener 


/lR s At [R n + AR s A f ] _1 


A, R n and R s 


The diagonal matrix 
A, inverse of the di- 
agonal of WA where 
W is the Wiener so- 
lution, removes for 
each mode the fil- 
tering effect of the 
Wiener filter. Unbi- 
ased, but not con- 
tamination free. 



Table 1. Summary of linear solutions to component separation when the mixing 
matrix A is known. 
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5.7 Annoying details 

Under the assumption that the response of each detector i in the instrument 
can itself be decomposed in the product of a spectral response hi(v) and a 
frequency independent symmetrical beam Bi , the contribution of component 
j to the observation obtained with detector i is: 



where Bi^ are the coefficients of the expansion of the symmetric beam of 
detector i on Legendre polynomials. 

The mixing matrix of this new linear model is seen to include a band 
integration, assumed to first order to be independent of £, a the effect of 
a beam, which depends on I. Both can be taken into account in a linear 
inversion, if known a priori. 

6 The Maximum Entropy Method 

The Wiener filter provides the best (in terms of minimum- variance, or max- 
imum likelihood) estimate of the component maps if two main asumptions 
hold. Firstly, the observations should be a linear mixture of distinct emis- 
sions. Secondly, the components and the noise should be (possibly correlated) 
Gaussian stationnary random processes. 

Unfortunately, the sky is known to be neither Gaussian, nor stationnary, 
with the possible exception of the CMB itself. Is this critical? 

The Maximum Entropy Method (MEM) of component separation is a 
method which inverts the same linear system of component mixtures, but 
assumes non-gaussian probability distributions [Hobson et al. (1998)]. 

6.1 Maximum Entropy 

The concept of entropy in information theory has been introduced by Shan- 
non in 1948 [Shannon (1948)]. The entropy of a discrete random variable X 
on a finite set of possible values {xi} with probability distribution function 
p(xi) = p(X = x^, is defined as: 



The principle of maximum entropy is based of the idea that whenever 
there is some choice to be made about the distribution function of the random 
variable X, one should choose the least informative option possible. Entropy 
measures the amount of information in a probability distribution, and entropy 
maximisation is a way of achieving this. 





(38) 



i=i 



32 J. Delabrouille and J.-F. Cardoso 



For instance, in the absence of any prior information, the probability 
distribution which maximises the entropy of equation 38 is a distribution 
with uniform probability, p{x{) — 1/N, i.e. the least informative choice of a 
probability distribution on the finite set {xi}, where all outcomes are equally 
likely. This is the most natural choice if nothing more is said about the 
probability distribution. 

In the opposite, a most informative choice would be a probability which 
gives a certain result, (for instance always X — x\). This is a probability 
distribution which minimizes entropy 

In the continuous case where X can achieve any real value x with proba- 
bility density p(x), entropy can be defined as: 

/oo 
p(x) logp(x) dx (39) 
-oo 

Of course, maximum entropy becomes really useful when there is also ad- 
ditional information available. In this case, the entropy must be maximized 
within the constraints given by additional information. 

For instance, the maximum entropy distribution of a real random variable 
of mean \x and variance a 2 is the normal (Gaussian) distribution : 

P(X) = 2^ CXP 

For this reason, in absence of additional information about the probability 
distribution of a random variable of known mean and variance, it is quite 
natural, according to the Maximum Entropy principle, to assume a Gaussian 
distribution - which maximises the entropy, and hence corresponds to the 
least informative choice possible. 

An other useful example is the maximum entropy distribution of a real 
positive random variable of mean /z, which is the exponential distribution : 

p{x) = - exp(-x/fj) 



2a 2 



6.2 Relative entropy 

In fact, the differential entropy of equation 39 has an unpleasant property. It 
is not invariant under coordinate transformations (on spaces with more than 
one dimension). 

The definition of relative entropy (or Kullback-Leibler divergence) be- 
tween two distributions solves the issue. It can be interpreted as a measure 
of the amount of additional information one gets from knowing the actual 
(true) probability distribution p(x), instead of an imperfect model m(x), and 
is given by: 

D KL (p\\m)= ^ p(x) log ^\dx (40) 
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Later in this paper (in section 8), we will make use of the Kullback-Leibler 
divergence for measuring the "mismatch" between two positive matrices H 1 
and R 2 . It will actually correspond to the KL divergence between two Gaus- 
sian distributions with covariance matrices Ri and R 2 . 

The relative entropy is invariant under coordinate transformations (be- 
cause both the ratio p(x)/m(x) and p(x)dx are invariant under coordinate 
transformations) . 

6.3 Component separation with the MEM 

In principle, replacing the Gaussian prior by some other prior is perfectly 
legitimate. In practice, the choice of such a prior is not obvious, as the full 
statistical description of a complex astrophysical component is difficult to 
apprehend.. 

Following the maximum entropy principle, one may decide to use as a prior 
the distribution which maximises the entropy given a set of constraints. If the 
constraints are the value of the mean, and the variance, then the maximum 
entropy prior is the Gaussian prior. 

Hobson and collaborators, in their MEM paper [Hobson et al. (1998)], 
argue that based on the maximum entropy principle, an appropriate prior 
for astrophysical components s is 

p(s) = cxp [-aS c (s, m u ,m v )] (41) 

with 

S c {s, m u , m v ) = ^ \ i>3 - m uj - ™ vj - Sj In 

where ipj = [s 2 +4m u jm v j] 1 / 2 , and where m u and m v are models of two pos- 
itive additive distributions (which are not clearly specified) used to represent 
the astrophysical components. 

A derivation for this is given in [Hobson & Lasenby (1998)], but the con- 
nection to entropy is not direct. In particular, the definition of entropy does 
not require the values of the random variables to be positive, but their prob- 
ability densities, which makes the discussion unconvincing. 

Pragmatically, the choice for the prior of equation 41 seems to be vali- 
dated a posteriori by the performance of the separation, which is not worse 
(and actually better for some of the components) than that obtained with 
the Wiener filter. It is not likely to be optimal, however, because the non- 
stationnarity of components implies correlations in the harmonic domain, 
which are not fully taken into account in the MEM implementation. 

The maximisation of the posterior probability (and hence of the product 
of the likelihood and the prior), is done with a dedicated fast maximisation 



2m. 



U] 
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algorithm. We refer the reader to the relevant papers for additional details 
[Hobson & Lasenby (1998), Stolyarov et al. (2002)]. 

This method has been applied to the separation of components in the 
COBE data [Barreiro et al. (2004)]. 

6.4 Comments about the MEM 

Although entropy has a clear meaning in terms of information content in 
the discrete case (e.g. it defines the minimum number of bits necessary to 
represent a sequence), there is no such interpretation in the continuous case. 
Entropy maximisation, understood as minimising the amount of arbitrary 
information in the assumed distribution, hence, is not very clearly founded 
for continuous images. 

The "principle" of maximum entropy, as the name indicates, is not a 
theorem, but a reasonable recipe which seems to work in practice. In the 
context of the CMB, there is no guarantee that it is optimal, among all 
non-linear solutions of the mixing system. MEM outperforms the Wiener 
filter solution for some components in particular because the entropic prior of 
Hobson and Lasenby allows heavier tails than the Gaussian prior. Other priors 
however, based on a physical model of the emissions, might well perform even 
better in some cases. This question remains as an open problem in the field. 

7 ICA and Blind source separation 
7.1 About blind separation 

The term "blind separation" refers to a fascinating possibility: if the compo- 
nents of a linear mixture are statistically independent, they can be recovered 
even if the mixing matrix A is unknown a priori. In essence, this is possible 
because statistical independence is, at the same time, a strong mathematical 
property and, quite often, a physically plausible one. 

There is an obvious and strong motivation for attempting blind com- 
ponent separation: allowing underlying components to be recovered blindly 
makes it possible to analyze multi-detector data with limited, imperfect, or 
even outright missing knowledge about the emission laws of the components. 
Even better, one can process data without knowing in advance which com- 
ponents might be "out there" . Hence, the blind approach is particularly well 
suited for exploratory data analysis. 

In the last fifteen years, blind component separation has been a very 
active area of research in the signal processing community where it goes by 
the names of "blind source separation" (BSS) and "independent component 
analysis" (ICA). This section outlines the principles underlying some of the 
best known approaches to blind source separation. There is not a single best 
approach because there is not a unique way in which to express statistical 
independence on the basis of a finite number of samples. 
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7.2 Statistical independence 

This section explains why blind component separation is possible in the first 
place. For the sake of exposition, the main ideas are discussed in the sim- 
plest case: there is no observation noise and there are as many "channels" as 
underlying components. Thus the model reduces to 

y(p) = As(p) 

where A is an n x n matrix and we are looking for annxn matrix "separating 
matrix" W. Of course, if the mixing matrix A is known, there is little mystery 
about separation: one should take W = A -1 and be done with it. 

If nothing is known about A but the components are known (or assumed) 
to be statistically independent, the idea is to determine W in such a way that 
the entries of vector Wy are independent (or as independent as possible) . In 
other words, the hope is that by restoring independence, one would restore 
the components themselves. Amazingly enough, this line of attack works. 
Even better, under various circumstances, it can be shown to correspond to 
maximum likelihood estimation and there is therefore some statistical opti- 
mality to it. . . provided the hypothesis of statistical independence is expressed 
vehemently enough. 

Note however that no matter the amount of statistical ingenuity thrown 
at blind component separation, there is no hope to recover completely the 
mixing matrix (or equivalently: the components). This is because a scalar 
factor can always be exchanged between each entry of s and the corresponding 
column of A without changing what the model predicts (i.e. the value of 
the product As) and without destroying the (hypothetical) independence 
between the entries of s. The same is true of a renumbering of the columns of 
A and of the entries of s. In other words, blind recovery is possible only up to 
rescaling and permutation of the components. In many applications, this will 
be "good enough" . If these indcterminacies have to be fixed, it can be done 
only by imposing additional constraints or resorting to side information. 

For any possible choice W of a candidate separating matrix, denote 

x(p) = Wy(p) 

the corresponding vector of candidate components. If W = A -1 then the 
entries of x are independent (since, in this case x(p) = s(pj). Under which 
circumstances would the converse true? Whenever the converse is true, it 
will be possible to recover the sources by looking for the linear transform W 
which makes them independent. Hence, we have a blind separation principle: 
to separate components, make them independent. 
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7.3 Correlations 

The main difficulty in blind source separation is to define a measure of inde- 
pendence. The problem is that the simple decorrelation condition 4 between 
any two candidate components: 

I p 

pJ2 Xi ( p ) = for l < l ^i< n - ( 42 ) 
p=i 

does not cut it. This is in fact obvious from the fact that this decorrela- 
tion condition between Xi and Xj is symmetric. Hence decorrelation provides 
only n(n — l)/2 constraints while n 2 constraints are needed to determine W 
uniquely. Therefore, more expressive forms of independence must be used. 
Two main avenues are possible: non-linear correlations and localized correla- 
tions, as described next. 

Non-linear correlations 

The "historical approach" to blind separation has been to determine a sepa- 
rating matrix W in order to obtain "non- linear decorrelations" i.e. 

1 p 

p^Vi(^))z»=0 for \<i+j<n (43) 
P =i 

where functions tpi, . . . , ip n : R i— > R are non-linear functions (more about 
choosing them below). By using non- linear functions, symmetry is broken 
and the required number of constraints is obtained, namely n(n — 1) (with 
n additional arbitrary constraints, needed for fixing the scale of each compo- 
nent. 

Localized correlations 

Another approach is to look for "localized decorrelation" in the sense of 
solving 

^E^W =0 for l <^3< n ( 44 ) 

p=l 

where for each component i, a sequence {o~i P }p =1 of positive number must be 
defined (more about this soon). Again, blind identification is possible because 
symmetry is broken, provided no two sequences of ct's are proportional. 



4 Here, as in the rest of this section, all signals are assumed to have zero mean. 
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Maximum likelihood 

Why using the particular proposals (43) or (44) as extended decorrelation 
conditions rather than any other form, possibly more complicated? One 
reason is that reasonable algorithms exist for computing the W such that 
x = Wy is a solution of (43) or (44). Another, more important reason is 
that these two conditions actually characterize the maximum likelihood es- 
timate of W in simple and well understood models. Because of this, we can 
understand what the algorithm does and we have guidance for choosing the 
non-linear functions tpi in condition (43) or the varying variance profiles of 
in condition (44) as stated next. 

Non linear correlations. Assume that each component {s,(p)} is mod- 
eled as having all pixels independently and identically distributed according 
to some probability density pi. In this model, the most likely value of A 
given the observations has for inverse a matrix W such that condition (43) 
holds with ipi — —p'i/pi- Hence, if the model is true (or approximately true), 
the non linear-function appearing in condition (43) should be taken as minus 
the derivative of the log-density of Si{p). For a Gaussian distribution pi, the 
corresponding function tpi is linear: here, the necessary non-linearity of ipi 
corresponds to the non Gaussianity of the corresponding component. 

Localized correlations. Alternatively, one may model each component 
{si(p)} as having all pixels independently and normally distributed with 
zero- mean and "local" variance af p . Then, in this model, the likeliest value 
of A given the observations has for inverse a matrix W such that x = Wy 
satisfies condition (44). 

7.4 ICA in practice 

For the simple noise-free setting under consideration (the noisy case is ad- 
dressed in next section), the algorithmic solutions depend on the type of 
decorrelation one decides to use. 

Non linear decorrelation 

Two popular ICA algorithms based on non-linear decorrelation (hence ex- 
ploiting non Gaussianity) are JADE [Cardoso (1999)] and FastICA [Hyvarinen 
In practice however, these algorithms do not exactly solve an equation in the 
form (43). Rather, for algorithmic efficiency, they try to solve it under the 
additional constraint that the components are uncorrelated i.e. that condi- 
tion (42) is satisfied exactly. The underlying optimization engine is a joint 
diagonalization algorithm for JADE and a fixed point technique for FastICA. 

Localized decorrelation 

Efficient algorithms for solving the localized decorrelation conditions (44) 
are based on assuming some regularity in the variance profiles: the sequences 
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{°fp} are approximated as being constant over small domains. Hence, the 
global set [1, P] is partitioned into Q subsets I\, . . . ,1q, each containing a 
number P q of points (so that P = J2q=i Pq)- I n practice, these pixel subsets 
are (well chosen) spatial regions. With a slight abuse of notation, we write 
of p = of if p S l q . Then, a small amount of maths turns the decorrelation 
conditions (44) into 



.9~ 



^P,S- 1 WR,Wt 



= for 1 < i ^ j < n (45) 



where R g is a localized covariance matrix 



R 9 = i-^y(p) y (p)t and E, = diag((^,,...,<7^). (46) 
H p£q 

An important point here is that by assuming piecewise constant variance pro- 
files, the localized decorrelation condition can be expressed entirely in terms 
of the localized covariance matrices R 9 . Hence the localized covariance ma- 
trices appear as sufficient statistics in this model. Even better, the likelihood 
of A can be understood as a mismatch between these statistics and their pre- 
dicted form, namely R 9 = AS 9 A^. Specifically, in this model the probability 
p(y(l), • • • y(P)|A, S) of the data given A and the set S = {Si, . . . , Sq} of 
covariance matrices is given by 

logp(y(l), . . . y(P)|A, E) = -<f>(A, S) + est 

where function <f> is defined as 

0(A, E) = P q K(K AS 9 At) (47) 
q 

and where K(-, ■) is a measure of divergence between two matrices defined as 
A"(Ri,R 2 ) = ^ (tracc(RiR 2 " 1 ) - logdet(RiR 2 " 1 ) - n) (48) 

This shows that maximum likelihood estimation of A amounts to the min- 
imization of the weighted mismatch (47) between the set of localized co- 
variance matrices R 9 (computed from the data) and their expected value 
Rq = AE 9 A^ (predicted by the model). 

In the noise-free case considered here, it turns out that there is a simple 
and very efficient algorithm (due to D.T. Pham) for minimizing the spectral 
mismatch. 
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8 SMICA 

We have developed a component separation technique dubbed SMICA for 
'spectral matching ICA' which is based on the ideas sketched at previous 
section but improves on them in several ways. 

In its simplest form, SMICA is based on spectral statistics, that is, on 
statistics which are localized not in space but in frequency. These statistics 
are binned auto- and cross-spectra of the channels. More specifically, for a 
given set of A c hann multi-channels maps {yi(p)}, we form for each (£,m) the 
-^chann x 1 vector y(£, to) of their harmonic coefficients and define 

m=+i 

fiw = 27TT £ y&™)y&™? 

m=—i 

These empirical spectral covariance matrices are then binned. In the sim- 
plest case, we define Q top-hat bins, with the q-th. frequency bin contains all 
frequencies i between ^™ ln and €™ ax . We consider the binned spectra: 

R q = y (2£+l)R(£) where P q = (2£+l). 

Here P q is the number of Fourier modes summed together in a single estimate 

The mixture model y = As + n predicts that the empirical spectra R<j 
have an expected value 

K q = (R 9 ) = AS^At + Ng 

where S g are the binned spectral covariance matrix for the components in 
bin q and N g is the same for noise, assumed to be uncorrelated from the 
components. The unknown parameters can be collected in a big vector 9: 

= {A,{SJ,{NJ} 

but in practice we will not fit such a large model. Many constraints can be 
imposed on 9. A typical choice is to assume that the components are uncor- 
related between themselves and that the noise also is uncorrelated between 
channels. Such a choice would result in a smaller parameter set 

6 = {A, {diagSj, {diagN J} 

but infinitely many other options are possible, both more stringent (like as- 
suming that the noise in each channel is a smooth function of the bin index 
q) or less stringent (like assuming that some components may not be un- 
correlated). In the following, we do not assume a specific parametrization of 
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the binned spectral covariance matrices. Rather, we denote where 9 is some 
parameter set which uniquely determines the values of A and each R 9 and 



SMICA determines the set 9 of unknown parameters by fitting the empirical 
spectral covariance matrices to whichever structure is predicted by the model. 
Specifically, the unknown parameters are found by minimizing the "spectral 
mismatch" 



averaged across bins. Some comments are in order regarding the matching 
criterion, the issue of non stationarity and practical implementation. 

Matching criterion 

The reason for choosing this particular form of mismatch between data and 
model is that minimizing (49) is identical to maximizing the likelihood of 
the data in a model where all components are 1) Gaussian 2) stationary 
and 3) have harmonic spectra which are constant over bins. Of course, these 
assumptions are not met in practice so one could choose a different criterion 
for matching R g to R q (0) but we have little statistical guidance for picking up 
an alternate matching measure. Furthermore, the assumptions 1) and 2) are 
met by the CMB and 3) is approximately correct for narrow bins. In addition, 
the failure of stationarity can be alleviated by using localized statistics (see 
below) . 

Non stationarity and localization 

The spectral approach to building a likelihood function has some benefits, 
in particular the fact that it is perfectly suited to describing the statistical 
properties of the CMB. Another beneficial side effect is that it makes it easy 
to deal with varying resolution from channel to channel as long as the beam 
can be considered to be symmetrical 

However, going straight away to harmonic space seems unreasonable to 
deal with highly non stationary components such as the galactic components. 
This issue can be addressed to some extent by resorting to localized spectral 
statistics. It is a simple matter to use spatial window functions to partition 
the sky into spatial domains [Cardoso et al. (2005)]. Although not a perfect 
solution, it certainly allows to capture a good deal of the non stationary 
features of the galactic sky. 

Implementation 




{RJ = {R q (9}} = {A(0)E,(0)A(0)t + N q (9)} 




(49) 



The definition of the spectral matching criterion (49) encapsulates all of the 
statistical modeling but leaves open the separate and possibly tricky issue 
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of minimizing <j)(8). 5 Because the criterion is a likelihood in disguise, it is 
possible to use the EM algorithm for its minimization, with the components 
taken as latent variables. However, EM is often not fast enough and also is not 
able to deal with arbitrary parametrization of and N q (9). It has been 

found necessary to use general optimization techniques. A conjugate gradient 
algorithm can be implemented because a reasonably tractable expression for 
the gradient of the criterion is available as: 

f Q = £P 9 trace (r,(*)-1(R,(*) - R,)R, W -i^> 

However, in our context, the conjugate gradient algorithm also requires pre- 
conditioning. A preconditioner can be classically obtained as the inverse of 
the Fisher information matrix FIM(#) which is taken as an approximation to 
the Hessian of 4>(0): 

{6) -^e~ 



H « FIM(0) = £P 9 trace (r^-^R, 



g 

Mismatch control and error bars 

A benefit of the SMICA approach is that it comes with a built-in measure of 
the quality of the model. Indeed, if we properly fit all the auto-cross spectra, 
then <j>(9) should be 'statistically small'. Visual control of the quality of the 
spectral matching is obtained by plotting <j> q = P q K (R q , H q (6)) versus q 
where 9 is the minimizer of (f)(9). This quantity should be understood as 
a% 2 . If the model holds (Gaussian stationary components and noise) and 
when all spectral parameters are freely estimated <f> q behaves approximately 
as a x 2 with a number of degrees of freedom equal to N\ — N2 where N\ = 
A^channCA^chann + l)/2 is the number of degrees of freedom for an sample 
covariance matrix of size A c h an n and where A2 = A comp + A c h a nn is the 
number of adjustable spectral parameters (the variances of each components 
and noise levels in a given frequency bin). 



9 Other blind, semi-blind, or model learning methods 

This paper would not be complete without a quick review of some of the 
recent work. We quote here a few papers which we think deserve reading for 

5 We note in passing that some authors seem to make a confusion between the 
objective function (the criterion which has to be minimised, which derives from 
a statistical model) and the algorithm used for minimization. For instance, some 
authors use the terms "EM method" , or "MCMC method" , to design a method 
in which they use the EM algorithm, or Monte-Carlo Markov Chains. This is 
rather infortunate, and contributes to a certain level of confusion. 
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further exploration of component separation issues and methods. Although 
unevenly mature, these methods provide complementary approaches, with 
advantages ad drawbacks which deserve to be investigated. 

9.1 FastICA 

A blind component separation based on the FastICA method has been devel- 
oped for CMB data reduction by Baccigalupi et al [Baccigalupi et al.(2000)], 
with an extension to the full sky by Maino et al [Maino et al.(2002)]. This 
blind approach uses, as "engine" for component separation, a measure of in- 
dependence based on non-Gaussianity. Therefore, it is essentially equivalent 
to finding components which cancel non-linear correlations in the sense of 
equation 43. 

For CMB applications, characterizing independence via non linear cor- 
relations of the form 43 has some limitations. Firstly, theory shows that 
this characterization allow for the separation of at most one Gaussian com- 
ponent [Cardoso (1998)]. The Gaussian component is somehow found "by 
default" , as the particular component which is orthogonal to (uncorrelated 
with) all other non Gaussian components. This is a concern for compo- 
nent separation performed with the CMB as the main target. Secondly, the 
non- linear decorrelation conditions do not take the noise into account. Even 
though this can be fixed in some ad hoc fashion, it is computationally de- 
manding to do it in maximum likelihood sense. Finally, pixel space imple- 
mentations cannot easily handle channel-dependent beams (unless explicit 
beam deconvolution is performed). If, to circumvent this problem, one con- 
siders harmonic space implementation, performance suffers from the fact that 
Fourier tend to be more Gaussian than the original, pixel-domain maps. 

FastICA, however, can outperform other component separation methods 
for some applications. Spectral based methods (like SMICA) cannot blindly 
separate two components if their angular power spectra are proportional. 
FastICA docs not suffer from this limitation and therefore has an edge for 
separating galactic components. If all galactic components have similar power 
spectra (say, proportional to then SMICA is expected to perform poorly 
without prior information. 

Although both FastICA and SMICA arc blind methods entering in the 
general class of "independent component analysis" , it should thus be stressed 
that they are conceptually very different. Performance, therefore, is expected 
to be very different also, and to depend on the actual properties of the data 
sets. 

FastICA has been used on COBE and on WMAP data [Maino et al.(2003), 
Maino et al.(2007)]. 
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9.2 Other recent developments 

A "semi-blind" approach to component separation has been proposed by 
Tegmark and collaborators in a work where they model the foreground emis- 
sions using a number of physical parameters, which they estimate directly in 
the data sets [Tegmark et al.(2000)]. They estimate the impact of estimating 
these extra parameters in terms of accuracy loss on parameters of interest 
for CMB science. This paper was the first to address seriously the problem 
of component spectral indices varying over the sky. 

Martinez-Gonzalez and collaborators have proposed a method for the ex- 
traction of the CMB specifically and for the estimation of its power spectrum 
[Martinez-Gonzalez et al.(2003)]. The EM algorithm is the main tool of the 
implementation. 

Eriksen and collaborators have developed a method based on a refined 
modeling of the astrophysical components, and fitting this model to the data 
to obtained estimates of foreground parameters [Eriksen et al. (2006)]. The 
fit of the parameters is made pixel by pixel at low-resolution using a MCMC 
techinquc for exploring the likelihood. After this first "model learning" step, 
the parameters obtained arc used to estimate high resolution component 
maps. 

Recently, Hansen and collaborators have proposed a CMB cleaning method 
based on a wavelet fit of component emissions obtained by differencing ob- 
servations in different channels, and subtraction of the fit from observations 
made at frequencies where the CMB dominates [Hansen et al.(2006)]. 

Bonaldi and collaborators have recently published a paper for estimat- 
ing parameters of emission of astrophysical components (emission laws, de- 
scribed by spectral indices). The statistics used are based on estimations of 
the correlations of the observations using a subset of points on the sphere 
[Bonaldi et al.(2006)]. 

An alternate way of performing component separation has been proposed 
by Bobin and collaborators, based on sparse representations of the various 
emissions [Bobin (2006)]. The basic principle of this method consists in de- 
composing the observations in a set of (redundant) dictionnaries chosen so 
that each component can be represented sparsely in one of the dictionnaries. 
Separation is achieved by minimizing the number of coefficients required to 
represent the data set. 

A comparison of these different methods on a common data set, for in- 
vestigating their strengths and weaknesses and evaluating their relative per- 
formance for various objectives would be an interesting work to improve the 
quality of component separation with the data set of upcoming space mis- 
sions. 
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10 Conclusion and prospects 

With improving data quality and increasingly demanding performance in 
component characterisation, component separation will play an important 
role in the analysis of CMB data sets in the next decade. 

In this paper, we have reviewed the main issues for component separation, 
concentrating on diffuse components specifically. 

Although substantial work has been performed, open questions remain. 
Polarisation, for instance, is one of the next major objectives of CMB science, 
for which much better sensitivities are required, and for which foreground 
emission is poorly known... Time varying sources, as the emission due to 
zodiacal light (modulated by the trajectory of the instrument in the ecliptic), 
as solar system objects in general, and as intrinsically time-varying radio 
sources, require specific methods tailored for their extraction. 

The upcoming Planck data set, expected to become available to the 
Planck consortium in 2008, will provide a fantastic and challenging data set 
for extracting the emission from all astrophysical processes emitting in the 
millimeter range. 
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